Dependency Parsing: Past, Present, and Future

نویسندگان

  • Wenliang Chen
  • Zhenghua Li
  • Min Zhang
چکیده

Dependency parsing has gained more and more interest in natural language processing in recent years due to its simplicity and general applicability for diverse languages. The international conference of computational natural language learning (CoNLL) has organized shared tasks on multilingual dependency parsing successively from 2006 to 2009, which leads to extensive progress on dependency parsing in both theoretical and practical perspectives. Meanwhile, dependency parsing has been successfully applied to machine translation, question answering, text mining, etc. To date, research on dependency parsing mainly focuses on data-driven supervised approaches and results show that the supervised models can achieve reasonable performance on in-domain texts for a variety of languages when manually labeled data is provided. However, relatively less effort is devoted to parsing out-domain texts and resource-poor languages, and few successful techniques are bought up for such scenario. This tutorial will cover all these research topics of dependency parsing and is composed of four major parts. Especially, we will survey the present progress of semi-supervised dependency parsing, web data parsing, and multilingual text parsing, and show some directions for future work. In the first part, we will introduce the fundamentals and supervised approaches for dependency parsing. The fundamentals include examples of dependency trees, annotated treebanks, evaluation metrics, and comparisons with other syntactic formulations like constituent parsing. Then we will introduce a few mainstream supervised approaches, i.e., transition-based, graph-based, easy-first, constituent-based dependency parsing. These approaches study dependency parsing from different perspectives, and achieve comparable and state-of-the-art performance for a wide range of languages. Then we will move to the hybrid models that combine the advantages of the above approaches. We will also introduce recent work on efficient parsing techniques, joint lexical analysis and dependency parsing, multiple treebank exploitation, etc. In the second part, we will survey the work on semi-supervised dependency parsing techniques. Such work aims to explore unlabeled data so that the parser can achieve higher performance. This tutorial will present several successful techniques that utilize information from different levels: whole tree level, partial tree level, and lexical level. We will discuss the advantages and limitations of these existing techniques. In the third part, we will survey the work on dependency parsing techniques for domain adaptation and web data. To advance research on out-domain parsing, researchers have organized two shared tasks, i.e., the CoNLL 2007 shared task and the shared task of syntactic analysis of non-canonical languages (SANCL 2012). Both two shared tasks attracted many participants. These participants tried different techniques to adapt the parser trained on WSJ texts to out-domain texts with the help of large-scale unlabeled data. Especially, we will present a brief survey on text normalization, which is proven to be very useful for parsing web data. In the fourth part, we will introduce the recent work on exploiting multilingual texts for dependency parsing, which falls into two lines of research. The first line is to improve supervised dependency parser with multilingual texts. The intuition behind is that ambiguities in the target language may be unambiguous in the source language. The other line is multilingual transfer learning which aims to project the syntactic knowledge from the source language to the target language.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

متن کامل

تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی

Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...

متن کامل

برچسب‌زنی خودکار نقش‌های معنایی در جملات فارسی به کمک درخت‌های وابستگی

Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

متن کامل

Word-based Japanese typed dependency parsing with grammatical function analysis

We present a novel scheme for wordbased Japanese typed dependency parser which integrates syntactic structure analysis and grammatical function analysis such as predicate-argument structure analysis. Compared to bunsetsu-based dependency parsing, which is predominantly used in Japanese NLP, it provides a natural way of extracting syntactic constituents, which is useful for downstream applicatio...

متن کامل

Urdu Dependency Parser: A Data-Driven approach

In this paper, we present what we believe to be the first data-driven dependency parser for Urdu. The parser was trained and tuned using MaltParser system, a system for data-driven dependency parsing. The Urdu dependency treebank (UDT) is used for training and testing of the Urdu dependency parser, is also presented first time. The UDT contains corpus of 2853 sentences which are annotated at mu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014